An entropic generalization of Caffarelli’s contraction theorem via covariance inequalities
The optimal transport map between the standard Gaussian measure and an
α-strongly log-concave probability measure is α^{-1/2}-Lipschitz,
as first observed in a celebrated theorem of Caffarelli. In this paper, we
apply two classical covariance inequalities (the Brascamp-Lieb and Cramér-Rao
inequalities) to prove a sharp bound on the Lipschitz constant of the map that
arises from entropically regularized optimal transport. In the limit as the
regularization tends to zero, we obtain an elegant and short proof of
Caffarelli's original result. We also extend Caffarelli's theorem to the
setting in which the Hessians of the log-densities of the measures are bounded
by arbitrary positive definite commuting matrices.
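For reference, the limiting statement recovered here (Caffarelli's contraction theorem) can be written as follows; the notation (γ for the standard Gaussian, T for the Brenier map) is chosen for this sketch:

```latex
% Caffarelli's contraction theorem (the zero-regularization limit above).
% Let \gamma = \mathcal{N}(0, I_d) and let \mu = e^{-V}\,\mathrm{d}x satisfy
% \nabla^2 V \succeq \alpha I_d (i.e., \mu is \alpha-strongly log-concave).
% Then the optimal transport (Brenier) map T pushing \gamma onto \mu satisfies
\[
  \lVert T(x) - T(y) \rVert \;\le\; \alpha^{-1/2} \, \lVert x - y \rVert
  \qquad \text{for all } x, y \in \mathbb{R}^d .
\]
```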
Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent
We study first-order optimization algorithms for computing the barycenter of
Gaussian distributions with respect to the optimal transport metric. Although
the objective is geodesically non-convex, Riemannian GD empirically converges
rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP
solvers. This stands in stark contrast to the best-known theoretical results
for Riemannian GD, which depend exponentially on the dimension. In this work,
we prove new geodesic convexity results which provide stronger control of the
iterates, yielding a dimension-free convergence rate. Our techniques also
enable the analysis of two related notions of averaging, the
entropically-regularized barycenter and the geometric median, providing the
first convergence guarantees for Riemannian GD for these problems.
Comment: 48 pages, 8 figures
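As a concrete illustration of the averaging scheme, here is a minimal NumPy sketch of the Riemannian GD iteration for the barycenter of centered Gaussians N(0, Σᵢ); the function names and the unit step size are illustrative assumptions, not the paper's code:

```python
import numpy as np

def sqrtm_psd(A):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def bw_barycenter(Sigmas, steps=100, eta=1.0):
    """Riemannian GD on the Bures-Wasserstein manifold for the barycenter
    of centered Gaussians N(0, Sigma_i); eta = 1 is the natural step size."""
    d = Sigmas[0].shape[0]
    Sigma = np.eye(d)  # arbitrary positive definite initialization
    for _ in range(steps):
        S_half = sqrtm_psd(Sigma)
        S_half_inv = np.linalg.inv(S_half)
        # average of the OT maps T_i from N(0, Sigma) to N(0, Sigma_i)
        T_avg = sum(
            S_half_inv @ sqrtm_psd(S_half @ Si @ S_half) @ S_half_inv
            for Si in Sigmas
        ) / len(Sigmas)
        # geodesic step: Sigma <- (I + eta (T_avg - I)) Sigma (I + eta (T_avg - I))
        M = np.eye(d) + eta * (T_avg - np.eye(d))
        Sigma = M @ Sigma @ M
    return Sigma
```

For commuting (e.g. diagonal) covariances, the barycenter covariance is the square of the average of the square roots, which gives a quick sanity check of the iteration.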
Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions
We provide theoretical convergence guarantees for score-based generative
models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which
constitute the backbone of large-scale real-world generative models such as
DALL·E 2. Our main result is that, assuming accurate score estimates,
such SGMs can efficiently sample from essentially any realistic data
distribution. In contrast to prior works, our results (1) hold for an
L²-accurate score estimate (rather than L∞-accurate); (2) do not
require restrictive functional inequality conditions that preclude substantial
non-log-concavity; (3) scale polynomially in all relevant problem parameters;
and (4) match state-of-the-art complexity guarantees for discretization of the
Langevin diffusion, provided that the score error is sufficiently small. We
view this as strong theoretical justification for the empirical success of
SGMs. We also examine SGMs based on the critically damped Langevin diffusion
(CLD). Contrary to conventional wisdom, we provide evidence that the use of the
CLD does not reduce the complexity of SGMs.
Comment: 30 pages
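The sampling mechanism analyzed here can be illustrated in one dimension with an exactly known score: noise the data with an Ornstein-Uhlenbeck forward process, then discretize the score-driven reverse SDE. All specifics below (the OU forward process, horizon, and step count) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward process: OU dX = -X dt + sqrt(2) dW, run to time T.
# For data ~ N(0, s0sq), the time-t marginal is N(0, var_t) with
# var_t = s0sq * exp(-2t) + (1 - exp(-2t)), so the score is -x / var_t.
T, n_steps, s0sq = 5.0, 500, 4.0
dt = T / n_steps

def score(x, t):
    var_t = s0sq * np.exp(-2 * t) + (1 - np.exp(-2 * t))
    return -x / var_t

# Reverse-time SDE, Euler-Maruyama: dY = [Y + 2 score(Y, T - t)] dt + sqrt(2) dW.
# Initialize from the time-T forward marginal (close to N(0, 1) for large T).
var_T = s0sq * np.exp(-2 * T) + (1 - np.exp(-2 * T))
y = np.sqrt(var_T) * rng.standard_normal(100_000)
t = 0.0
for _ in range(n_steps):
    y += (y + 2 * score(y, T - t)) * dt + np.sqrt(2 * dt) * rng.standard_normal(y.shape)
    t += dt

print(y.var())  # approximately s0sq = 4.0: samples recover the data law
```

With the exact score, the only errors are discretization and Monte Carlo noise; replacing `score` by an L²-accurate estimate is the setting the paper's guarantees address.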
Query lower bounds for log-concave sampling
Log-concave sampling has witnessed remarkable algorithmic advances in recent
years, but the corresponding problem of proving lower bounds for this task has
remained elusive, with lower bounds previously known only in dimension one. In
this work, we establish the following query lower bounds: (1) sampling from
strongly log-concave and log-smooth distributions in dimension d ≥ 2
requires Ω(log κ) queries, which is sharp in any constant
dimension, and (2) sampling from Gaussians in dimension d (hence also from
general log-concave and log-smooth distributions in dimension d) requires
Ω̃(min(√κ log d, d)) queries, which is nearly sharp
for the class of Gaussians. Here κ denotes the condition number of the
target distribution. Our proofs rely upon (1) a multiscale construction
inspired by work on the Kakeya conjecture in harmonic analysis, and (2) a novel
reduction that demonstrates that block Krylov algorithms are optimal for this
problem, as well as connections to lower bound techniques based on Wishart
matrices developed in the matrix-vector query literature.
Comment: 46 pages, 2 figures
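For concreteness, the condition number κ of a Gaussian target N(0, Σ) is the ratio of the extreme eigenvalues of Σ, since the potential V(x) = x^T Σ^{-1} x / 2 has Hessian Σ^{-1}; the covariance matrix below is an arbitrary example:

```python
import numpy as np

# For N(0, Sigma): strong convexity alpha = 1 / lambda_max(Sigma),
# smoothness L = 1 / lambda_min(Sigma), so kappa = L / alpha
# = lambda_max(Sigma) / lambda_min(Sigma).
Sigma = np.diag([1.0, 4.0, 25.0])  # illustrative covariance
eigs = np.linalg.eigvalsh(Sigma)
kappa = eigs.max() / eigs.min()
print(kappa)  # 25.0
```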